Any method for RNA secondary structure prediction is determined by four ingredients. The Architecture is the choice of features implemented by the model (such as stacked basepairs, loop length distributions, etc.); the architecture determines the number of parameters in the model. The Scoring Scheme is the nature of those parameters (thermodynamic, probabilistic or weights). The Parameterization stands for the specific values assigned to the parameters. These three ingredients are referred to as "the model." The fourth ingredient is the Folding Algorithm used to predict plausible secondary structures given the model and the sequence of a structural RNA. Here, I make several unifying observations drawn from looking at more than 40 years of methods for RNA secondary structure prediction in the light of this classification. As a final observation, there seems to be a performance ceiling that affects all methods with complex architectures, and it affects all scoring schemes with remarkable similarity. This suggests that modeling RNA secondary structure using intrinsic sequence-based plausible "foldability" will require incorporating other forms of information in order to constrain the folding space and improve prediction accuracy. This could give an advantage to probabilistic scoring systems, since a probabilistic framework is a natural platform for incorporating different sources of information into a single inference problem.



Introduction 

Methods for RNA secondary structure prediction based on thermodynamic parameters were already introduced in the 1980s. 1–4 These still widely used thermodynamic methods owe their success to the incorporation of a large number of folding features (in addition to the standard basepairs) and to a carefully crafted experimental estimation of the corresponding thermodynamic parameters. 5–12 The collection of thermodynamic parameters is usually referred to as the nearest-neighbor model of RNA folding, because it puts special emphasis on the thermodynamics of basepair correlations with their most adjacent bases (whether paired or unpaired). Indeed, their success has been such that more than 40 years later the most widely used methods for RNA secondary structure prediction are thermodynamic, and not very different from the original ones. Representative examples are Mfold/UNAFold, 13,14 ViennaRNA 15,16 and RNAstructure. 10,17 Yet, despite their durability, the thermodynamic methods have been shown to have relatively poor folding accuracy. 11,18–20

By the 1990s, probabilistic models had been brought into the problem of RNA structure prediction. 21–24 Prior to these approaches, probabilistic models had been introduced for the related problem of RNA homology detection using a consensus secondary structure, for which a "profile model" is built with position-specific scores. 25,26 All these early probabilistic methods used RNA secondary structure in combination with other sources of information (comparative analysis, covariation, alignments or others). For instance, the method Pfold 22,27 informs the likelihood of two positions being basepaired by looking at the pattern of covariation of those two positions in an input alignment. QRNA, in addition to using an input (pairwise) alignment to inform the likelihood of any two positions being basepaired, also uses the alignment to analyze the codon-like (multiple-of-three) pattern of mutations in order to infer the likelihood of the sequence being protein coding. 24 Profile probabilistic models for RNA homology use RNA structural covariation in combination with sequence conservation to improve homology detection. 28 Other probabilistic methods implemented afterward also integrate different forms of information. 25

One limitation of thermodynamic models is that their parameterization (i.e., the selection of parameter values) is laborious, since it requires calorimetry measurements of many model RNA structures. For instance, there are very good estimates of stacking free energies for basepairs (that is, the free energy of a basepair as it stacks onto another contiguous pair), but other parameters, such as those for multi-loop features, have to be guessed due to a lack of thermodynamic information.

The labor-intensive nature of obtaining thermodynamic parameters is a motivation for exploring alternative approaches that can be trained statistically using structural data. We use the term "statistical" to describe all methods that train their parameters using known RNA structures. Statistical methods can, in turn, be separated into probabilistic and "weights" methods depending on the nature of their parameters. Among statistical methods, probabilistic models have the additional advantage of being easy to train for arbitrary types of features. A seminal work in 2004 used this advantage to explore a collection of about ten different simple model architectures using probabilistic scores. 30 Results showed that some models with about 20 parameters perform surprisingly close to the standard thermodynamic models, which use thousands of parameters.

Alternatively, statistical non-probabilistic "weights" methods have been introduced in recent years. Examples are CONTRAfold, 31,32 Simfold 33,34 and ContextFold. 35 These statistical methods use complex architectures similar to those of the thermodynamic models, and they seem to outperform thermodynamic methods. 36

As late as 2009, all published probabilistic methods for RNA secondary structure had been models with no more than 100 parameters, while the contemporary statistical non-probabilistic models included complexities similar to those of the thermodynamic models. This historical accident introduced an otherwise unfounded belief that complex models of RNA secondary structure could not be paired with probabilistic scoring schemes. 31 Since then, several efforts toward more complex probabilistic models of RNA secondary structure have been presented. 37,38 Recently, explicit implementations of probabilistic models that express the same complex features as the thermodynamic models and more, while using a comparable number of parameters, have been presented. 39

Probabilistic models, in addition to being comparable to other methods in the complexity of features they can incorporate, are useful for exploring the relative importance of different features of RNA secondary structure beyond the complexities of the thermodynamic models. TORNADO, a compiler that can parse a wide range of RNA architectures, has been used to explore this space. 39 The results were somewhat disappointing. Improvements can be found, but RNA folding accuracies using probabilistic models are just slightly above those of other statistical methods such as CONTRAfold. In addition, statistical methods with large numbers of parameters are easy to overtrain, and the data sets commonly used to train and test these methods 33,34,40 are particularly vulnerable to overtraining because they lack sufficient structural diversity.

The literature, and even this brief historical review of it, may give the impression that these different methods (thermodynamic, statistical probabilistic, or statistical using unconstrained weights) have little in common with each other. In this manuscript, I would like to show that they share fundamental principles, and that looking at what makes the methods similar (as opposed to different) helps us understand the overall problem and suggests ways of moving forward.



The Four Elements of an RNA Secondary Structure Prediction Algorithm

The four elements necessary (and sufficient) to specify a single-sequence RNA secondary structure prediction method are: the architecture (or number of parameters), the scoring scheme (or nature of the parameters: thermodynamic, probabilistic or weights), the parameterization (or actual values of the parameters) and the folding algorithms. These elements are summarized in Figure 1. Next, I explore each of these four components in some depth, which will lead to some unifying observations.

Up front, the observations are: (1) Any architecture for RNA secondary structure can be described in the form of a grammar in the Chomsky sense. 41 (2) Although historically it was believed that probabilistic scoring schemes could not be used for architectures with thousands of parameters, it has been shown that architectures of arbitrary complexity can be paired with all three scoring systems. (3) While the parameterization methods are specific to each scoring scheme, the folding algorithms are essentially identical for all scoring types. (4) For all architectures tested, folding algorithms that take into account the whole ensemble of possible structures outperform simpler "best structure" algorithms. This result holds true across all scoring schemes. (5) For complex architectures, models using either trained probabilities or trained weights predict RNA secondary structures with higher accuracy than methods based solely on thermodynamic parameters. (6) Proper training and testing of methods for RNA secondary structure prediction with large numbers of parameters require test sets with different structures (not just different sequences) from the training sets. The current data sets of structural RNAs lack sufficient structural diversity for a proper parameterization and testing of these complex methods.

Architecture 

RNA secondary structure is defined by the hydrogen-bond interactions (in cis) between the Watson-Crick faces of two nucleotides located an arbitrary distance apart in the RNA backbone. RNA secondary structure basepairs are usually of the form A-U (U-A), C-G (G-C) and G-U (U-G), although other pairs occur at lower frequency. The Watson-Crick/Watson-Crick basepairs in cis are often referred to as the canonical basepairs. Hydrogen-bond interactions involving other faces (there are three per nucleotide: Watson-Crick, Sugar or Hoogsteen) or other conformations (cis or trans) are often referred to as non-canonical basepairs, 42 and, in turn, they determine the tertiary structure of the molecule.

Canonical basepairs usually occur in conjunction with other canonical basepairs, forming short helices (or stems) that give stability to the molecule. RNA helices can be interrupted by unpaired nucleotides. Most RNA helices are nested within each other (that is, with no crossing basepairs). Independent helices (or groups of nested helices) tend to aggregate next to each other in crystal structures, often stacked coaxially to form longer stems. However, a small fraction of basepairs appear in non-nested configurations named pseudoknots. In this review, I concentrate on methods for RNA secondary structure prediction, leaving aside pseudoknots as well as tertiary interactions. One should not forget, however, that pseudoknots and tertiary interactions might be exactly what could move the methods forward and yield better prediction accuracies.

Figure 1. Unified description of different methods for single-sequence RNA secondary structure prediction. The menu of elements that define a method is: architecture, scoring scheme, parameterization and inference method. The architecture consists of the list of features which, in turn, determine the number of parameters of the model. The different architectures one can devise for a nested RNA secondary structure all fall into the category of a Context-Free Grammar (CFG). Any architecture can be implemented using either thermodynamic, weights or probabilistic parameters. Both weight and probabilistic schemes can be trained on data (statistical). There are statistical weight schemes such as CLLMs. Statistical probabilistic schemes for RNA folding are usually stochastic CFGs (SCFGs). Notice that SCFGs are a subset of CFGs: SCFGs describe models with a probabilistic scheme, while the concept of a CFG applies to all scoring schemes. The assignment of values for the parameters (parameterization) depends on the scoring scheme used. Thermodynamic models take values as kcal/mol free-energy estimates from experimental data. Conditional Log-Linear models use methods that require numerical optimization (CML and also online training). Probabilistic models are usually trained by maximum likelihood methods, which simply require obtaining frequencies of occurrences in the training set [and the addition of at least Laplace (+1) priors]. Once an architecture, scoring scheme and parameterization are in place (that is, a "model"), one can use different algorithms to infer plausible secondary structures. Unlike training, which is specific to the different scoring schemes, the folding algorithms (usually dynamic programming algorithms) are essentially identical for all parameterizations (although they often have different names). A side note: the term "probabilistic" often leads to confusion. In the end, all scoring schemes (probabilistic or not) can give us insight into the probability distribution of structures (π) for a given sequence (s) (the so-called Boltzmann ensemble in a thermodynamic scheme). For instance, one can calculate the distribution's partition function (via the McCaskill or inside algorithms) or rigorously sample structures from that distribution. However, what is normally referred to as a "probabilistic" model is one in which the parameters of the model are themselves probabilities. Probabilistic models are "generative" models, which means that in addition to the Boltzmann ensemble per sequence, they also provide insight into the joint distribution for the ensemble of sequences and structures. With a probabilistic method, one can quite naturally generate sequences together with their structures according to the model.

An important advance was the realization that any existing nested (i.e., secondary structure) method for RNA folding could be represented as a context-free grammar (CFG), 41 and that RNA secondary structure prediction could be viewed as CFG parsing. 43 A CFG consists of non-terminals (NTs) (represented with capital letters), terminals (the actual RNA bases, represented with lower case letters) and production rules of the form [NT → (any combination of NTs/terminals)]. The production rules recursively determine the strings of RNA bases and structures that the grammar permits. 44 Grammars for RNA folding allow all possible strings of nucleotides (possibly with some restrictions on the secondary structures allowed), but they "weight" each string differently according to a scoring system that assigns values to the parameters of the grammar. Grammar parameters that provide scores for the actual nucleotides are named "emissions." Parameters that weight the different choices (rules) for a given non-terminal are named "transitions." I will discuss the different scoring schemes and how to assign actual values to the parameters in the next sections. Here, I concentrate on the different CFG rules used to describe RNA secondary structure.

A production rule that represents the formation of a basepair is of the form $(S \rightarrow a\, S\, \hat{a})$, where $a$ and $\hat{a}$ stand for two paired bases. A scoring system for this one-rule grammar requires assigning 16 (or six) parameters, depending on whether one allows all possible nucleotide pairs or just the restricted A-U, G-C and G-U basepairs (in both orientations). This "grammar" would produce a single infinitely long helix of all paired RNA bases. Not quite what we want.

A grammar that produces discontinuous helices with single-stranded bases connecting the stems, and generates independent as well as nested stems, could have the form 43

$$S \rightarrow a\, S\, \hat{a} \mid a\, S \mid S\, a \mid S\, S \mid \epsilon.$$

This grammar has five rules, here separated by the "or" symbol (|). The fourth rule allows the possibility of multiple helices, and the fifth rule ends a string. The grammar allows one to introduce 16 (or six) basepair emissions, four single-base emissions and five transitions, one for each rule.

The sequence of grammar rules necessary to produce a given RNA structure is named a derivation (or parse). A possible derivation under the above grammar for the toy stem "cacccug" (where nucleotides c-g and a-u are paired to each other) is

$$S \Rightarrow c\,S\,g \Rightarrow ca\,S\,ug \Rightarrow cac\,S\,ug \Rightarrow cacc\,S\,ug \Rightarrow caccc\,S\,ug \Rightarrow caccc\,\epsilon\,ug = cacccug.$$

Double arrows ($\Rightarrow$) are used for derivations, while single arrows ($\rightarrow$) are used for depicting the rules of the grammar.
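To make the derivation concrete, the short Python sketch below replays it mechanically. It is only an illustration (the helper `derive` and the rule names are hypothetical, not taken from any RNA package): each step rewrites the leftmost non-terminal S with one of the five rules of the grammar $S \rightarrow a\,S\,\hat{a} \mid a\,S \mid S\,a \mid S\,S \mid \epsilon$.

```python
# Hypothetical illustration: replay a leftmost derivation under the grammar
#   S -> a S â | a S | S a | S S | ε
# Each rule maps the emitted bases to the string that replaces one "S".
RULES = {
    "pair": lambda left, right: left + "S" + right,  # S -> a S â
    "left": lambda base: base + "S",                 # S -> a S
    "right": lambda base: "S" + base,                # S -> S a
    "split": lambda: "SS",                           # S -> S S
    "end": lambda: "",                               # S -> ε
}


def derive(steps):
    """Apply (rule_name, *bases) steps to the leftmost S; return every sentential form."""
    form = "S"
    history = [form]
    for name, *bases in steps:
        i = form.index("S")  # leftmost non-terminal (bases are lower case)
        form = form[:i] + RULES[name](*bases) + form[i + 1:]
        history.append(form)
    return history


toy_stem = [("pair", "c", "g"), ("pair", "a", "u"),
            ("left", "c"), ("left", "c"), ("left", "c"), ("end",)]
print(" => ".join(derive(toy_stem)))
```

Replacing any of the three `("left", "c")` steps with `("right", "c")` still ends in "cacccug", which previews the ambiguity problem discussed next.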

The above grammar serves the purpose of illustrating a simple architecture for RNA secondary structure prediction. However, it has some undesirable properties, mainly "ambiguity," which means that certain structures can be obtained in many different ways (parses) from the grammar. In an ambiguous grammar, one needs to be careful to consider the contributions of all possible parses in order to correctly calculate the weight (or probability) of a given structure. 45 In our toy example, one can easily see that the three unpaired c's could have been produced by any combination of the $(S \rightarrow a\,S)$ and $(S \rightarrow S\,a)$ rules, leading (for large RNAs) to a combinatorial explosion of grammar parses representing the same structure. To avoid this potential complication, it is always convenient to work with unambiguous grammars (that is, grammars that guarantee one unique parse per structure).
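The size of that explosion is easy to quantify for the two rules just mentioned. The sketch below (Python; the function name is ours) counts, with an inside-style recursion over sequence spans, how many parse trees the sub-grammar $S \rightarrow a\,S \mid S\,a \mid \epsilon$ assigns to a run of n unpaired bases. We deliberately leave out $S \rightarrow S\,S$: combined with $S \rightarrow \epsilon$ it would allow infinitely many parses of every string.

```python
from functools import lru_cache


def count_unpaired_parses(n):
    """Parse trees of n unpaired bases under S -> a S | S a | ε (no S S rule)."""

    @lru_cache(maxsize=None)
    def inside(i, j):
        # number of parses of the half-open span [i, j)
        if i == j:
            return 1  # only S -> ε
        # either emit base i on the left (S -> a S) or base j-1 on the right (S -> S a)
        return inside(i + 1, j) + inside(i, j - 1)

    return inside(0, n)


print([count_unpaired_parses(n) for n in range(6)])
```

The count doubles with every unpaired base (2^n parses, so 8 for the three c's of the toy stem), whereas an unambiguous grammar assigns exactly one parse to the same structure.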

The Nussinov grammar, one of the first introduced for RNA folding, 1 is unambiguous and has the form

$$S \rightarrow S\, a \mid S\, a\, S\, \hat{a} \mid \epsilon.$$
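As an illustration of parsing with this grammar, the sketch below fills a dynamic programming table that follows its three rules directly: for a span, the last base is either unpaired ($S \rightarrow S\,a$), paired with some earlier base k ($S \rightarrow S\,a\,S\,\hat{a}$), or the span is empty ($S \rightarrow \epsilon$). We assume a scoring of one point per allowed basepair (A-U, G-C and G-U in both orientations) and no minimum hairpin length; a traceback recovers one optimal structure in dot-bracket notation. It is a minimal sketch, not a reproduction of any published implementation.

```python
# Assumed scoring: +1 for each allowed pair; any loop length is permitted.
ALLOWED = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}


def nussinov(seq):
    """Maximize basepairs with S -> S a | S a S â | ε; return (score, dot-bracket)."""
    n = len(seq)
    # best[i][j] = maximum number of pairs in the half-open span seq[i:j]
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = best[i][j - 1]  # S -> S a: last base unpaired
            for k in range(i, j - 1):  # S -> S a S â: base k pairs with base j-1
                if (seq[k], seq[j - 1]) in ALLOWED:
                    score = max(score, best[i][k] + 1 + best[k + 1][j - 1])
            best[i][j] = score

    out = ["."] * n

    def trace(i, j):
        if i == j:  # S -> ε
            return
        if best[i][j] == best[i][j - 1]:
            trace(i, j - 1)
            return
        for k in range(i, j - 1):
            if ((seq[k], seq[j - 1]) in ALLOWED
                    and best[i][j] == best[i][k] + 1 + best[k + 1][j - 1]):
                out[k], out[j - 1] = "(", ")"
                trace(i, k)
                trace(k + 1, j - 1)
                return

    trace(0, n)
    return best[0][n], "".join(out)


print(nussinov("cacccug"))
```

Because every structure has exactly one parse, the maximization never double-counts a structure; the same recursion with sums instead of maxima would count or score structures directly.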

Another unambiguous grammar, with three non-terminals instead of one but a similar number of emission parameters, is the g6 grammar, 30 first introduced with the method Pfold: 22

S -> L S | L      # select left-most helix or base, or
                  # final helix or base

L -> a F â | a    # helix starts (emit first basepair), or
                  # one single nucleotide emission

F -> a F â | L S  # helix continues (emit another basepair), or
                  # hairpin, internal, and multifurcation loops
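One way to convince oneself that g6 is unambiguous is to count parse trees and compare them with a direct count of secondary structures. The Python sketch below does this by dynamic programming, under two assumptions of ours: any two bases may pair (so the counts depend only on sequence length), and the comparison is against nested structures in which every basepair encloses at least two bases, which is the constraint the rules $F \rightarrow a\,F\,\hat{a} \mid L\,S$ impose (F cannot produce fewer than two bases). If the grammar is unambiguous, the two counts must agree for every length; the function names are illustrative.

```python
from functools import lru_cache


def count_g6_parses(n):
    """Number of g6 parse trees for a length-n sequence (n >= 1, any bases may pair)."""

    @lru_cache(maxsize=None)
    def S(i, j):  # S -> L S | L on the half-open span [i, j)
        if i >= j:
            return 0
        total = L(i, j)
        for k in range(i + 1, j):
            total += L(i, k) * S(k, j)
        return total

    @lru_cache(maxsize=None)
    def L(i, j):  # L -> a F â | a
        if j - i == 1:
            return 1
        return F(i + 1, j - 1) if j - i >= 2 else 0

    @lru_cache(maxsize=None)
    def F(i, j):  # F -> a F â | L S
        if i >= j:
            return 0
        total = F(i + 1, j - 1) if j - i >= 2 else 0
        for k in range(i + 1, j):
            total += L(i, k) * S(k, j)
        return total

    return S(0, n)


def count_structures(n, min_loop=2):
    """Nested structures of length n where every pair encloses >= min_loop bases."""

    @lru_cache(maxsize=None)
    def N(i, j):
        if j - i <= 0:
            return 1
        total = N(i, j - 1)  # last base unpaired
        for k in range(i, j - 1):  # last base pairs with k
            if (j - 1) - k - 1 >= min_loop:
                total += N(i, k) * N(k + 1, j - 1)
        return total

    return N(0, n)


print([(count_g6_parses(n), count_structures(n)) for n in range(1, 9)])
```

Running the same comparison on the five-rule grammar introduced earlier would fail immediately, since it admits many parses per structure.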

The g6 grammar performs surprisingly better than the Nussinov grammar and other similar grammars when they are all trained on the same data. 30 When we later look at different scoring schemes for these two grammars, we will see what causes this difference in performance.

The standard thermodynamic nearest-neighbor model is more complex and realistic than the grammars presented so far. 7 Each of the features of RNA secondary structure that the nearest-neighbor model scores can also be represented as a grammar. This is key to understanding how the thermodynamic parameters can be separated from the model architecture, and how parameter values other than thermodynamic free energies could be used with the same "nearest-neighbor architecture."

Notice that there may be fewer actual parameters than parameters originally described by the CFG. Parameters for different grammar rules can be tied together instead of considering them all independent from each other. Selecting the effective number of parameters is a design choice, often guided by the need to avoid an excessive number of free parameters. In Table 1, we report for each architecture the number of "free tied parameters," that is, independent parameters after tying equivalences have been established.
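As a small worked example of counting free tied parameters, consider g6 under a probabilistic scheme, where a distribution over k outcomes has k − 1 free parameters. Assuming (our reading, not stated above) that the two basepair-emitting rules $L \rightarrow a\,F\,\hat{a}$ and $F \rightarrow a\,F\,\hat{a}$ share one tied emission distribution, the count matches the 11 (six basepairs) and 21 (16 basepairs) listed for g6 in Table 1.

```python
def free_parameters(distribution_sizes):
    """A probability distribution over k outcomes has k - 1 free parameters."""
    return sum(k - 1 for k in distribution_sizes)


def g6_free_parameters(n_pair_types):
    transitions = [2, 2, 2]  # S -> L S | L;  L -> a F â | a;  F -> a F â | L S
    single_base = [4]        # L -> a emits one of a, c, g, u
    # Assumption: L -> a F â and F -> a F â are tied to one shared pair distribution.
    pairs = [n_pair_types]
    return free_parameters(transitions + single_base + pairs)


print(g6_free_parameters(6), g6_free_parameters(16))
```

Untying the two pair distributions would add another 5 (or 15) free parameters, which illustrates why tying is a deliberate design choice.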



Next comes a description of several relevant features of the nearest-neighbor architecture depicted as CFG rules. Theory suggested that RNA stability is more a function of the "stacking" of basepairs than of H-bond face interactions alone; thus "stacking" is inherent to the nearest-neighbor architecture. 5,7 A rule that would produce stacked pairs is of the form

stacked basepairs $(S^{c\hat{c}} \rightarrow a\, S^{a\hat{a}}\, \hat{a})$,

in which a basepair $(a, \hat{a})$ depends on the contiguous basepair $(c, \hat{c})$. This rule is short notation for 16 (or six) rules, and each of them requires 16 (or six) parameters.
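To see what a stacking architecture conditions on, the sketch below lists, for a sequence and a nested structure in dot-bracket notation, every pair $(a, \hat{a})$ together with the contiguous outer pair $(c, \hat{c})$ it stacks onto. Each such (outer, inner) combination indexes one parameter of the stacking rule, so a full table has $16 \times 16 = 256$ (or $6 \times 6 = 36$) entries. Dot-bracket input and the helper names are assumptions made for the example.

```python
def partners(structure):
    """Map each paired position to its partner for a balanced dot-bracket string."""
    stack, partner = [], {}
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def stacking_contexts(seq, structure):
    """Return ((c, ĉ), (a, â)) for every pair (a, â) stacked inside a pair (c, ĉ)."""
    partner = partners(structure)
    contexts = []
    for i in sorted(partner):
        j = partner[i]
        if i < j and partner.get(i + 1) == j - 1:
            contexts.append(((seq[i], seq[j]), (seq[i + 1], seq[j - 1])))
    return contexts


def stacking_table_size(n_pair_types):
    """One parameter per (outer pair, inner pair) combination."""
    return n_pair_types * n_pair_types


print(stacking_contexts("ggcaaagcc", "(((...)))"))
print(stacking_table_size(16), stacking_table_size(6))
```

A three-pair helix yields two stacking contexts: the outermost pair is scored by the rule that opens the helix, not by a stack.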
Examples of other nearest-neighbor features implemented by the thermodynamic models are

left and right dangles $(P^{c\hat{c}} \rightarrow a\, F \mid F\, a)$,

in which a single left (or right) base depends on the adjacent basepair. Here, P stands for a "paired" non-terminal and F for an arbitrary non-terminal.

basepairs depending on left and right dangles $(P^{c} \rightarrow a\, F\, \hat{a})$, $(P^{d} \rightarrow a\, F\, \hat{a})$, $(P^{cd} \rightarrow a\, F\, \hat{a})$,

in which a basepair $(a, \hat{a})$ depends on the contiguous unpaired bases (c), (d) or both.

Another type of feature fundamental to the nearest-neighbor architecture, which extends beyond the simpler grammars above, is that of loop length distributions.

Hairpin loops $[P \rightarrow (m \ldots m)]$,

a closing rule that places a hairpin loop. This compact representation indicates a finite number of possible loops. There are many different ways to assign parameters to this rule. Below are two examples:

hairpin loops under the independence assumption $(P \rightarrow a_1 a_2 a_3 \mid a_1 a_2 a_3 a_4 \mid \ldots \mid a_1 \ldots a_N)$.

This rule allows hairpin loops with lengths ranging from three to N, and their relative weights are determined by N-2 transition parameters. All bases in a given hairpin loop are emitted independently according to a single set of four nucleotide emission parameters.

hairpins with exactly four bases (tetraloops) $(P \rightarrow a_1 a_2 a_3 a_4)$.

A special rule for tetraloops. This rule uses a set of $4^4 = 256$ parameters to describe all possible tetraloops.
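Under a probabilistic scheme, the independence rule above scores a hairpin loop as one length transition times one emission per base. The Python sketch below computes that log-probability and, for contrast, the size of the dedicated tetraloop table. The dictionaries `length_probs` and `base_probs` are hypothetical inputs (uniform values in the usage lines), not trained parameters.

```python
import math
from itertools import product


def hairpin_log_prob(loop, length_probs, base_probs):
    """log P(loop) = log P(length) + sum of independent per-base emission log-probs."""
    if len(loop) not in length_probs:
        return float("-inf")  # length outside the 3..N range allowed by the rule
    return math.log(length_probs[len(loop)]) + sum(math.log(base_probs[b]) for b in loop)


N = 8
length_probs = {length: 1 / (N - 2) for length in range(3, N + 1)}  # N-2 transitions
base_probs = {b: 0.25 for b in "acgu"}                              # 4 emissions
tetraloops = ["".join(p) for p in product("acgu", repeat=4)]        # 4**4 entries

print(hairpin_log_prob("gaaa", length_probs, base_probs), len(tetraloops))
```

The independence model spends only (N-2) + 4 numbers on all hairpins, while the tetraloop rule alone needs 256, one per sequence of length four.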

Bulge loops (left and right) $[P \rightarrow (m \ldots m)\, F]$, $[P \rightarrow F\, (m \ldots m)]$.

Internal loops $[P \rightarrow (m \ldots m)\, F\, (m \ldots m)]$,

similar to the above, but where the unpaired bases appear on both sides of the basepair.

The nearest-neighbor model also considers neighboring information for the loops, such as

tetraloops depending on the closing basepair $(P^{c\hat{c}} \rightarrow a_1 a_2 a_3 a_4)$:

hairpin loops with exactly four bases depending on the closing basepair $(c, \hat{c})$.

Hairpin mismatches $[P^{c\hat{c}} \rightarrow a\, (m \ldots m)\, b]$,

in which the two terminal bases of a hairpin loop $(a, b)$ depend on the closing basepair $(c, \hat{c})$.

Dangles in bulges $[P^{c\hat{c}} \rightarrow a\, (m \ldots m)\, b\, F\, \hat{b}]$,

in which the end base $(a)$ of a bulge depends on the adjacent basepair $(c, \hat{c})$, and the closing basepair $(b, \hat{b})$ depends on the adjacent bulge base.
Internal loop mismatches $[P^{c\hat{c}} \rightarrow a\, (m \ldots m)\, b\, F\, \hat{b}\, (m \ldots m)\, e]$,

where, for an internal loop delimited by the two basepairs $(c, \hat{c})$ and $(b, \hat{b})$, the closing bases $(a, e)$ depend on the adjacent basepair $(c, \hat{c})$, and the basepair $(b, \hat{b})$ depends on the adjacent bases in the internal loop.

Table 1. Comparison of different methods for RNA secondary structure prediction

| Method | Architecture: # free tied parameters (6 bps / 16 bps) | Scoring scheme | Parameterization | Training datasets | Folding method | Best F (%), TestSetA | Best F (%), TestSetB |
|---|---|---|---|---|---|---|---|
| •g6 | 11 / 21 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 49.1 | 47.5 |
| •basic grammar | 532 / 572 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 56.9 | 56.5 |
| °CONTRAfold v2.02 | ~300 | weights | CML | S-Processed-TRA | c-MEA | 57.2 | 57.9 |
| •CONTRAfoldG | 1,278 / 5,448 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 58.3 | 58.6 |
| °UNAFold-3.8 | ~3,500 | thermodynamic | fit to exp. data | | CYK | 51.0 | 51.3 |
| °Simfold | as above | weights | CML | S-Processed-TRA | CYK | 56.5 | 55.3 |
| °RNAstructure v5.2 | ~12,700 | thermodynamic | fit to exp. data | | GCE | 53.5 | 53.8 |
| °ViennaRNA v1.8.4 | as above | thermodynamic | fit to exp. data | | GCE | 53.7 | 54.3 |
| •ViennaRNAG | 14,307 / 90,497 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 60.2 | 59.4 |
| •ViennaRNAGz_bulge2_ld_mdangle | 14,557 / 91,997 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 60.5 | 59.5 |
| °ContextFold v1.00 | 205,000 | weights | online CML | S-Full | CYK | 64.4 | 49.0 |

Models. Models marked "°" are versioned stand-alone packages. Models marked "•" are CFGs (with alternative scoring schemes) introduced in reference 39. In particular, ViennaRNAG is a CFG that, when parameterized with thermodynamic scores, reproduces the ViennaRNA v1.8.4 method, and CONTRAfoldG is another CFG that, when parameterized with particular scores, reproduces CONTRAfold v2.02. Here, we present the results of probabilistic parameterizations for those grammars. Parameters. Methods are ordered by increasing number of parameters. Here we report the effective free parameters after tying. (The number of parameters for some of the native thermodynamic methods is only approximate and corresponds to two different versions of the nearest-neighbor model.) Test sets. TestSetA is a well-curated collection of sequences from about 10 bona fide RNA structures. TestSetB includes a collection of about 22 different RNA structures obtained from Rfam v10.0. TestSetA and TestSetB are structurally dissimilar, and they have been defined in reference 39. Performance accuracy. We use F (the harmonic mean of sensitivity and positive predictive value), such that an F of 100% would mean perfect prediction. Performance accuracy is calculated for the entire test set of sequences (instead of averaging the accuracy of each individual sequence). These "total" measures tend to be smaller than those obtained by averaging over sequences, because they correct for the (usually abundant) small sequences in the test sets, for which prediction is much easier than for longer sequences. For methods that use an MEA algorithm with a tunable parameter (both c-MEA 31 and GCE 36), this table reports the "best F" in the ROC curve between sensitivity and positive predictive value (see ref. 39 for more details). Training sets. Provenance of training sets is as follows: TrainSetA+2*TrainSetB, 39 S-Processed-TRA, 33 S-Full. 34
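The difference between the "total" F used in Table 1 and a per-sequence average can be seen with a toy calculation. In the Python sketch below, each test sequence is summarized by three hypothetical counts (correctly predicted pairs, true pairs, predicted pairs); F is the harmonic mean of sensitivity and positive predictive value, which simplifies to 2·TP/(true + predicted). The numbers are invented for illustration and do not come from any benchmark.

```python
def f_measure(tp, n_true, n_pred):
    """Harmonic mean of sensitivity (tp/n_true) and PPV (tp/n_pred), as 2tp/(true+pred)."""
    if n_true + n_pred == 0:
        return 0.0
    return 2 * tp / (n_true + n_pred)


def total_f(results):
    """F from counts pooled over the whole test set."""
    tp, n_true, n_pred = (sum(column) for column in zip(*results))
    return f_measure(tp, n_true, n_pred)


def mean_f(results):
    """Average of per-sequence F values."""
    return sum(f_measure(*r) for r in results) / len(results)


# Hypothetical counts: a short, easy RNA and a long, hard one.
results = [(5, 5, 5), (20, 100, 80)]
print(round(total_f(results), 3), round(mean_f(results), 3))
```

The perfectly predicted short sequence pulls the per-sequence average above 0.6, while the pooled value stays near 0.26, which is the effect the table footnote describes.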

A complete context-free grammar that implements all the features of the nearest-neighbor model can be found in reference 39. A simpler grammar that includes the basic features of those models, but with stacking, dangles and mismatches removed, can be an instructive simplification. That simpler grammar, which can be viewed as a "scaffold" for the nearest-neighbor model, has the form

S  -> a S | F S | ε        # Start: find a left base, or a left helix,
                           # or end
F  -> a F â | a P â        # Helix continues | helix ends
P  -> m...m                # Hairpin loop
P  -> m...m F | F m...m    # Left or right bulge
P  -> m...m F m...m        # Internal loop
P  -> M1 M                 # Multiloop: two or more helices
M  -> M1 M | R             # One or more helices
M1 -> a M1 | F             # One helix, possibly with single left
                           # unpaired bases
R  -> R a | M1             # Last helix, possibly with left and
                           # right unpaired bases

This basic grammar has six non-terminals. Non-terminal "F" corresponds to a helix. Non-terminal "P" corresponds to all possible types of loops. The possible loop fates are: a hairpin loop, a left or right bulge continued by another helix, an internal loop also continued by another helix, or a multi-loop with possibly unpaired bases that includes at least two more helices. This distinction between different types of loops is at the core of the nearest-neighbor model, and it is missing in the simpler g6 grammar described before.
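The loop fates that P distinguishes can be read off any nested structure. The Python sketch below (our own illustration; dot-bracket input and function names are assumptions) labels the loop closed by each basepair as "stack" (the helix continues, $F \rightarrow a\,F\,\hat{a}$), "hairpin", "bulge", "internal" or "multiloop", by counting the basepairs directly enclosed and the unpaired bases on each side.

```python
def partners(structure):
    """Map each paired position to its partner for a balanced dot-bracket string."""
    stack, partner = [], {}
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def classify_loops(structure):
    """Label the loop closed by each basepair (i, j) of a nested structure."""
    partner = partners(structure)
    labels = {}
    for i in sorted(k for k in partner if k < partner[k]):
        j = partner[i]
        enclosed, k = [], i + 1
        while k < j:  # walk the loop, jumping over each directly enclosed helix
            if k in partner:
                enclosed.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not enclosed:
            labels[(i, j)] = "hairpin"
        elif len(enclosed) > 1:
            labels[(i, j)] = "multiloop"
        else:
            left = enclosed[0][0] - i - 1    # unpaired bases on the 5' side
            right = j - enclosed[0][1] - 1   # unpaired bases on the 3' side
            if left == 0 and right == 0:
                labels[(i, j)] = "stack"
            elif left == 0 or right == 0:
                labels[(i, j)] = "bulge"
            else:
                labels[(i, j)] = "internal"
    return labels


print(classify_loops("((.(...)))"))
```

The g6 grammar cannot make these distinctions: for it, everything that is not a stacked pair falls under the single rule $F \rightarrow L\,S$.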

All these complex features were first introduced by the nearest-neighbor model and adopted right away by the thermodynamic methods. 3 At present, complex features have been explored with all possible scoring schemes. For example, the statistical method CONTRAfold uses an architecture that closely follows the nearest-neighbor model, but it maintains a relatively small number of free (tied) parameters in order to keep the training under control. In reference 39, CFGs mimicking the architectures of both CONTRAfold and ViennaRNA have been presented.

Statistical methods, both probabilistic and using real-valued weights, have explored an even larger range of parameters than the nearest-neighbor model. For example, in the method ContextFold, higher-than-first-order Markov dependencies have been considered, such as (see ref. 35 for more details)

one or more unpaired bases depending on several other bases, where the conditioning bases are indexed by their position relative to the subsequence being emitted;

three single bases depending on two basepairs.

In TORNADO, several additional features have been tested, such as

mismatches (or dangles) in multiloops,

where multi-loop bases contiguous to basepairs depend on the closing basepairs;

coaxial stacking $(P \rightarrow a\, F\, \hat{a}\, b\, F\, \hat{b})$,

where two contiguous stems with closing basepairs $(a, \hat{a})$ and $(b, \hat{b})$, respectively, have their final basepair emissions depending on each other;

stem length distributions $(P \rightarrow a_1 F \hat{a}_1 \mid a_1 a_2 F \hat{a}_2 \hat{a}_1 \mid \ldots \mid a_1 \ldots a_N F \hat{a}_N \ldots \hat{a}_1)$,

and several others that have been proposed and are allowed by the general grammar that parses different architectures (see ref. 39 for more details and examples). Models that incorporate a subset of three-base interactions have also been proposed. 46

Notice that all architectures discussed here strictly apply to 2D-nested RNA structures and thus fall in the category of context-free grammars. There are other elements of RNA structure that are important, add information to the 2D-nested structures, and have also been modeled computationally. For instance, dynamic programming algorithms that incorporate pseudoknots of canonical basepairs (WC-WC in cis) exist. 47,48 These algorithms require architectures that do not fit in the context-free grammar category, and they have much more demanding computational requirements. Folding methods that incorporate particular 3D RNA motifs also exist. 49 These RNA motifs involve exclusively non-canonical basepairs, which (like pseudoknots) are mostly non-nested interactions. Methods for the identification and discovery of new RNA motifs would be of great importance to refine RNA structure prediction.

Scoring Scheme 

In the literature, there seems to be an almost automatic association between context-free grammars and probabilistic scoring schemes, but that is again a historical accident. In the previous section, I have tried to disentangle those two concepts. For our purpose, a context-free grammar is just a convenient framework for factoring the nested long-range interactions that occur in RNA secondary structure into independent terms, regardless of the types of scores used. In this section, I discuss three different scoring schemes that could be used with a given RNA architecture (i.e., CFG).

A thermodynamic scheme assigns scores that are free energies (in units of kcal/mol) to the emission and transition parameters. Many of the parameters, including stacking rules, are obtained by calculating, from melting curves, the equilibrium constants of small synthetic RNA oligos between paired and unpaired conformations. Other parameters, like the loop length distributions, are obtained from polymer physics, and others are just plainly guessed. Research to improve the values and architecture of thermodynamic parameters is still active. 12
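Free energies turn into relative weights of structures through the Boltzmann factor $\exp(-\Delta G / RT)$, which is what lets a thermodynamic scheme define a distribution over structures. The Python sketch below applies it to a hypothetical, explicitly enumerated set of structures with made-up ΔG values; actual folding programs obtain the same normalization (the partition function) by dynamic programming over all structures rather than by enumeration.

```python
import math

R = 1.98717e-3  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(free_energies, temperature=310.15):
    """Map structure -> ΔG (kcal/mol) to structure -> equilibrium probability."""
    kt = R * temperature
    weights = {s: math.exp(-dg / kt) for s, dg in free_energies.items()}
    z = sum(weights.values())  # partition function over the listed structures
    return {s: w / z for s, w in weights.items()}


# Hypothetical energies for three alternative structures of one sequence.
probs = boltzmann_probabilities({"((...))": -3.0, "(.....)": -1.0, ".......": 0.0})
print({s: round(p, 3) for s, p in probs.items()})
```

Lower free energy means a larger weight, so a 2 kcal/mol difference at 37 °C changes the relative probability by a factor of roughly 25.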



In a probabilistic scheme, one considers the probability of pro-
ducing an A-U basepair vs. the probability of any other basepair.
Those probabilities could be selected by hand; for instance,
$P_{AU} = P_{UA} = P_{CG} = P_{GC} = 1/4$ if we discard G-U/U-G pairs and
assume all the others are equally likely. Normally, these probabil-
ities are estimated from large numbers of known RNA structures.
Technically, in a probabilistic scheme, the transitions associated with
a given non-terminal define a probability distribution. Similarly,
for each rule, if there are nucleotides being emitted, an emission
probability distribution needs to be defined. Probabilistic (sto-
chastic) CFGs (SCFGs) were first suggested for single-sequence
RNA secondary structure prediction in reference 43, following from their
earlier use in consensus structure prediction in reference 25.

In a weighted scheme, one assigns a weight to producing an A-U
basepair vs. the weights of any other basepair. Those weights
could be selected by hand; for instance, the very first dynamic
programming algorithm for RNA used a weighted scheme in
which $W_{AU} = W_{UA} = W_{CG} = W_{GC} = 1$, but normally they are
also trained from large numbers of known RNA structures. In
a way, thermodynamic schemes are a particular case of weighted
schemes in which the weights reflect the stability (lower weight,
more stability) of the different basepairs, so that
$W_{GC}, W_{CG} < W_{AU}, W_{UA}$, since A-U pairs have two H-bonds,
while G-C pairs, with three H-bonds, are more stable. Technically,
in a weighted scheme, both the transition and emission parameters
are real-valued numbers that do not need to obey any normalization
constraint, nor do they need to have any thermodynamic
interpretation. These methods include Conditional Random Fields
(CRFs) and their generalization, Conditional Log-Linear Models
(CLLMs). CLLMs were first used for RNA in the method CONTRAfold. 31

Both SCFGs and CLLMs are referred to as statistical meth- 
ods because they obtain the values of the parameters by learning 
them from known examples of RNA structures. Figure 1 gives
some historical examples of each of the different scoring schemes.

Methods that use thermodynamic parameters or weights
(trained or not) are referred to as discriminative methods.
Discriminative methods are those that calculate (or optimize) the
conditional probability distribution of structures ($\pi_s$) for a given
sequence ($s$) and a model $M$, $P(\pi_s \mid s, M)$. Methods that use
probabilistic parameters (such as SCFGs) are referred to as generative
methods. Generative methods calculate the joint distribution of
sequences and structures given the model, $P(s, \pi_s \mid M)$.

The decision about what kind of scoring scheme to use
touches on a debate in the wider machine-learning field about
the relative virtues of discriminative vs. generative methods. 50-52
The best scoring system does not necessarily need to be the same
for all problems, nor does it have to be probabilistic for a specific
problem. Regardless, many people, including myself, favor using
probabilistic methods (in this case, SCFGs for RNA folding).
The reasons for this bias are the following: (1) Probabilistic meth- 
ods, by being generative, allow one to do basic sanity checks such 
as "does this model produce synthetic structural RNAs that look 
anything like real structural RNAs?" (2) Probabilistic methods 
are ideal for combining different forms of information together, 
something that is an important consideration when trying to 
solve inference problems where RNA structure is only one piece 
of the available information. 23 (3) Probabilistic models are easier
to train than discriminative methods, as I will discuss in the next 
section. 

Another important reason for favoring a probabilistic scor- 
ing scheme is that one can get insight into relevant features of 
RNA folding that might have gone unnoticed (and improperly 
parameterized) under other scoring schemes. For instance, there 
is good correlation between the stacked basepair emission param- 
eters determined by the nearest-neighbor thermodynamic model 
and those estimated using statistical data. 53 But in addition to 
the stacked basepairs, there are other "structural" and long-range 
features of RNA secondary structure that only become apparent 
after mapping the nearest-neighbor thermodynamic model into 
a grammar architecture. Those additional parameters besides 
stacked basepairs (and other emission distributions well-char- 
acterized by the thermodynamic models such as mismatches, 
and loop distributions) are usually transition parameters in the 
grammar formalism. Examples are the relative weights of the
different types of loops (whether hairpin, bulge, internal loop or 
multi-loop), which correspond to the transition parameters of 
non-terminal P in the basic grammar introduced above. These 
transition parameters are usually hard to determine by thermo- 
dynamic experimentation with small synthetic oligonucleotides. 

Many of those transition parameters acquire geometric mean-
ing under a probabilistic scheme. For instance, the g6 grammar
performs significantly better than the Nussinov grammar when
both are trained on data. 30 A probabilistic interpretation of the
parameters of these two grammars tells us that the Nussinov
grammar simply infers the fraction of unpaired vs. paired bases
in the RNAs in the training set, by means of learning the counts
of the transitions $(S \to S a)$ and $(S \to S a S \hat{a})$, respectively. What
makes the g6 grammar different from just specifying the relative
proportions of paired vs. unpaired bases is the rule $(L \to a F \hat{a})$,
which gets used any time a new helix starts. The generative ver-
sion of g6 relates the rules' transition probabilities $t[NT \to \ldots]$
to structural properties of the RNAs. For instance, the expected
number of helices using g6 is given by

$$\frac{t[L \to a F \hat{a}]}{t[S \to L]\, t[L \to a] - t[L \to a F \hat{a}]},$$

and the expected length of a helix (in basepairs) is

$$\frac{1}{1 - t[F \to a F \hat{a}]}.$$

The substantially better performance of a probabilistic g6 
grammar suggests that these features, unaccounted for in the 
Nussinov grammar, are important. It would be interesting to 
use thermodynamic data to fit a regression for a thermodynamic 
parameterization of the g6 grammar. 
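The two g6 expressions for the expected number of helices and the expected helix length can be checked numerically. The sketch below assumes the g6 rules $S \to L S \mid L$, $L \to a F \hat{a} \mid a$ and $F \to a F \hat{a} \mid L S$; the parameter names (`t_S_L`, `t_L_aFa`, `t_F_aFa`) and their values are hypothetical, not trained parameters. It samples derivations, counts one helix per use of $L \to a F \hat{a}$ together with its length in basepairs, and compares the averages with the closed forms.

```python
import random


def expected_helices(t_S_L, t_L_aFa):
    """Closed form: t[L->aFa'] / (t[S->L] * t[L->a] - t[L->aFa'])."""
    t_L_a = 1.0 - t_L_aFa
    denom = t_S_L * t_L_a - t_L_aFa
    if denom <= 0:
        raise ValueError("parameters give derivations of unbounded expected size")
    return t_L_aFa / denom


def expected_helix_length(t_F_aFa):
    """Closed form: 1 / (1 - t[F->aFa']), in basepairs."""
    return 1.0 / (1.0 - t_F_aFa)


def sample_helices(t_S_L, t_L_aFa, t_F_aFa, rng):
    """Sample one g6 derivation from S; return the lengths of its helices."""
    helices = []
    stack = ["S"]
    while stack:
        nt = stack.pop()
        if nt == "S":
            stack.append("L")               # S -> L  or  S -> L S
            if rng.random() >= t_S_L:       # S -> L S with prob 1 - t[S->L]
                stack.append("S")
        elif nt == "L" and rng.random() < t_L_aFa:
            length = 1                       # L -> a F a' opens a helix
            while rng.random() < t_F_aFa:    # F -> a F a' stacks another pair
                length += 1
            helices.append(length)
            stack.extend(["L", "S"])         # F -> L S closes the helix
        # otherwise L -> a: one unpaired base, nothing left to expand
    return helices


rng = random.Random(0)
draws = [sample_helices(0.5, 0.1, 0.7, rng) for _ in range(40000)]
lengths = [h for d in draws for h in d]
print(sum(map(len, draws)) / len(draws), expected_helices(0.5, 0.1))
print(sum(lengths) / len(lengths), expected_helix_length(0.7))
```

The denominator check reflects that the first formula is only meaningful when $t[S \to L]\,t[L \to a] > t[L \to a F \hat{a}]$, i.e. when derivations stay finite on average.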

Parameterization 

Under a thermodynamic scoring scheme, the parameters of
the model are determined by regression fit to a collection of
calorimetry data. Under a probabilistic or weighted scoring scheme,
one could set arbitrary values for the parameters (for instance,
as was done for the first RNA folding algorithm 1 ), but most
commonly, the values of the parameters are obtained by training
on a set of known RNA secondary structures. Methods for RNA
secondary structure prediction that set the values of parameters
by training are usually referred to as statistical methods.

A probabilistic (or generative) model, $G$, specifies the joint
probability of a given RNA sequence $s$ and a structure $\pi_s$ as a
product of probabilities of independent features:

$$P(s, \pi_s \mid G) = \prod_a t_a^{\,C_a(s, \pi_s)}. \qquad (1)$$

Here I assume that the model has only one non-terminal and one set
of probabilistic parameters $\{t_a\}$, such that

$$\sum_a t_a = 1,$$

and where $C_a(s, \pi_s)$ is the number of times that feature $a$
appears in structure $\pi_s$. (The equation generalizes to architec-
tures with an arbitrary number of non-terminals and probability
distributions of parameters.)

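A discriminative model ($D$) specifies the conditional probabil-
ity of a given structure $\pi_s$ given the sequence $s$,

$$P(\pi_s \mid s, D) = \frac{\prod_a e^{W_a C_a(s, \pi_s)}}{\sum_{\pi'_s} \prod_a e^{W_a C_a(s, \pi'_s)}}. \qquad (2)$$

Here $W_a$ stands for the "score" associated with feature $a$.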

Discriminative methods include both thermodynamic and
weighted scoring systems. Thus, the score $W_a$ could be either the
negative ratio of a Gibbs free-energy change $\Delta G_a$ (measured in
kcal/mol) to $RT$ (the product of the molar gas constant and the
absolute temperature), or a real value without any thermodynamic
interpretation.

Notice that (disregarding the normalization factor for the dis- 
criminative conditional probability) there is an equivalence (when 
comparing Eqn. 1 with Eqn. 2) between the probabilistic param-
eter $t_a$ and the exponential of the discriminative score, $e^{W_a}$. This
equivalence is why any folding algorithm can be applied to any 
scoring scheme (probabilistic, thermodynamic or using weights). 
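The correspondence between Equations 1 and 2 can be made concrete with a small sketch. Everything in it is hypothetical: one sequence whose only allowed structures are assumed to be three candidates, described by made-up feature counts $C_a(s,\pi_s)$ for two made-up features (`pair_GC`, `unpaired`), and made-up probabilities $t_a$. Setting $W_a = \log t_a$ and normalizing Equation 2 over the candidates reproduces the joint probabilities of Equation 1 divided by their sum.

```python
import math

# Hypothetical feature counts C_a(s, pi) for the only three structures
# assumed to be allowed for one sequence s.
counts_by_structure = {
    "(((...)))": {"pair_GC": 3, "unpaired": 3},
    "((.....))": {"pair_GC": 2, "unpaired": 5},
    ".........": {"pair_GC": 0, "unpaired": 9},
}
t = {"pair_GC": 0.4, "unpaired": 0.6}  # hypothetical probabilities, sum to 1


def log_joint(counts, t):
    """Equation 1 in log form: log P(s, pi | G) = sum_a C_a * log t_a."""
    return sum(c * math.log(t[a]) for a, c in counts.items())


def conditional(counts_by_structure, W):
    """Equation 2: P(pi | s, D) = exp(sum_a W_a C_a) / (sum over all pi')."""
    scores = {pi: sum(W[a] * c for a, c in counts.items())
              for pi, counts in counts_by_structure.items()}
    top = max(scores.values())  # shift by the max to avoid overflow
    z = sum(math.exp(v - top) for v in scores.values())
    return {pi: math.exp(v - top) / z for pi, v in scores.items()}


W = {a: math.log(p) for a, p in t.items()}  # W_a = log t_a
cond = conditional(counts_by_structure, W)
joint = {pi: math.exp(log_joint(c, t)) for pi, c in counts_by_structure.items()}
total = sum(joint.values())
for pi in counts_by_structure:
    print(pi, round(cond[pi], 6), round(joint[pi] / total, 6))
```

The reverse direction does not hold in general: arbitrary weights $W_a$ need not exponentiate to parameters that sum to one, so they define a conditional distribution but no joint one.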

One of the most widely used training methods for genera-
tive (i.e., probabilistic) models is maximum likelihood (ML). ML
estimation of the parameters $\{t_a\}$ corresponds to maximizing, for a
training set of sequences $s$ with their structures $\pi_s$, the sum of
log joint probabilities $\sum_s \log P(s, \pi_s \mid G)$ under the probabilistic
constraint

$$\sum_a t_a = 1.$$

The Lagrange optimization of the constrained ML expression

$$\frac{\partial}{\partial t_a}\left[\sum_s \sum_b C_b(s, \pi_s) \log t_b - \lambda\left(\sum_b t_b - 1\right)\right] = 0,$$

for a Lagrange multiplier $\lambda$, has the closed-form solution

$$t_a = \frac{\sum_s C_a(s, \pi_s)}{\sum_s \sum_b C_b(s, \pi_s)};$$

that is, the ML estimation of probabilistic parameters corre-
sponds to taking the total frequency of occurrence of features in
the training set.

ML parameterization alone is problematic when data are
scarce, because some parameters will get zero values, thus over-
fitting to the data. One usually adds priors to the ML estimates
(for instance, a +1 pseudocount for each feature, as in Laplace
priors). Estimating probabilities with added priors corresponds to
a posterior mean estimate of the parameters under a Bayesian
interpretation.
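The closed-form ML solution and its pseudocount-smoothed variant both amount to counting. A minimal sketch, assuming a hypothetical feature set made only of the six canonical and wobble basepair types (a real architecture would also count transitions, loop lengths and so on) and two toy training structures in dot-bracket notation:

```python
from collections import Counter

PAIR_TYPES = ["AU", "UA", "GC", "CG", "GU", "UG"]  # assumed feature set


def basepair_counts(seq, dot_bracket):
    """C_a(s, pi): number of basepairs of each type a in one structure."""
    counts = Counter()
    opened = []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            opened.append(pos)
        elif symbol == ")":
            i = opened.pop()
            counts[seq[i] + seq[pos]] += 1
    return counts


def estimate(training_set, pseudocount=0.0):
    """Normalized feature frequencies, optionally with pseudocounts.

    pseudocount=0 gives the ML estimate; pseudocount=1 adds +1 to every
    feature (Laplace), the posterior mean under a uniform Dirichlet prior.
    """
    total = Counter()
    for seq, dot_bracket in training_set:
        total.update(basepair_counts(seq, dot_bracket))
    denom = sum(total[a] for a in PAIR_TYPES) + pseudocount * len(PAIR_TYPES)
    return {a: (total[a] + pseudocount) / denom for a in PAIR_TYPES}


training_set = [("GGGAAACCC", "(((...)))"), ("AUGAAACAU", "(((...)))")]
ml = estimate(training_set)
laplace = estimate(training_set, pseudocount=1.0)
print(ml["CG"], laplace["CG"])  # the unseen C-G type: 0 under ML, 1/12 with +1
```

With only six basepairs observed, the ML estimate assigns probability zero to C-G, G-U and U-G pairs, which is exactly the overfitting that the pseudocounts are meant to soften.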

ML optimization cannot be applied to a weighted scoring
scheme because those are discriminative methods, which can
only calculate the conditional probability of structures given a
sequence. One of the most widely used training methods for dis-
criminative methods is conditional maximum likelihood (CML).
CML estimation of the weights is done by maximizing, for a train-
ing set, the quantity $\sum_s \log P(\pi_s \mid s, D)$. There is no closed-form
solution for this optimization problem because of the normaliza-
tion term in the denominator of Equation 2. One needs to use
numerical optimization techniques to obtain the CML-trained
parameters, for example as in CONTRAfold. 31 In reference 34,
the authors propose architectures that match different versions
of the nearest-neighbor thermodynamic model, and then use the
thermodynamic parameters as input in the optimization process.
All these CML training methods are usually quite time consum-
ing since they require global optimizations over the whole train-
ing set. Alternative, faster methods for CML training that do not
require such global optimization ("online" training) have been
devised, 54,55 and applied to RNA structure prediction. 35 Tests
on sets with substantial structural diversity with respect to the
training set suggest that the online training methods are par-
ticularly vulnerable to overtraining (see Table 1).
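For a log-linear model of the form of Equation 2, the gradient of $\sum_s \log P(\pi_s \mid s, D)$ with respect to $W_a$ is the observed count of feature $a$ minus its expected count under the current weights, which is what numerical CML optimizers climb. The sketch below is a toy illustration only, assuming a hypothetical feature set (basepair types plus an unpaired-base count), a minimum hairpin of three bases, plain fixed-step gradient ascent, and brute-force enumeration of all structures, which is feasible only for very short sequences; practical trainers compute the expectations with inside/outside-style dynamic programming instead.

```python
import math
from collections import Counter

MIN_LOOP = 3                                    # assumed minimum hairpin size
ALLOWED = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed allowed pair types


def all_structures(seq, i=0, j=None):
    """Every nested structure of seq[i:j] as a tuple of (k, l) pairs."""
    if j is None:
        j = len(seq)
    if j <= i:
        return [()]
    result = list(all_structures(seq, i, j - 1))   # last base unpaired
    last = j - 1
    for k in range(i, last - MIN_LOOP):            # last base paired with k
        if seq[k] + seq[last] in ALLOWED:
            for left in all_structures(seq, i, k):
                for inner in all_structures(seq, k + 1, last):
                    result.append(left + ((k, last),) + inner)
    return result


def features(seq, pairs):
    """C_a(s, pi): basepair-type counts plus the number of unpaired bases."""
    c = Counter(seq[k] + seq[l] for k, l in pairs)
    c["unpaired"] = len(seq) - 2 * len(pairs)
    return c


def score(W, c):
    return sum(W.get(a, 0.0) * n for a, n in c.items())


def cml_objective_and_gradient(training_set, W):
    """sum_s log P(pi_s | s, D) and its gradient (observed - expected counts)."""
    loglik, grad = 0.0, Counter()
    for seq, true_pairs in training_set:
        feats = [features(seq, p) for p in all_structures(seq)]
        scores = [score(W, f) for f in feats]
        top = max(scores)
        log_z = top + math.log(sum(math.exp(x - top) for x in scores))
        observed = features(seq, true_pairs)
        loglik += score(W, observed) - log_z
        for a, n in observed.items():
            grad[a] += n
        for f, x in zip(feats, scores):
            p = math.exp(x - log_z)                # P(pi' | s, D)
            for a, n in f.items():
                grad[a] -= p * n
    return loglik, grad


def to_pairs(dot_bracket):
    opened, pairs = [], []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            opened.append(pos)
        elif symbol == ")":
            pairs.append((opened.pop(), pos))
    return tuple(sorted(pairs))


training_set = [(s, to_pairs(db)) for s, db in
                [("GGGAAACCC", "(((...)))"), ("AUGAAACAU", "(((...)))")]]
W, history = {}, []
for step in range(30):                      # fixed-step gradient ascent
    loglik, grad = cml_objective_and_gradient(training_set, W)
    history.append(loglik)
    for a, g in grad.items():
        W[a] = W.get(a, 0.0) + 0.05 * g
print(history[0], history[-1])
```

Because the normalizer $\log Z$ must be recomputed for every sequence at every step, each iteration touches the whole training set, which is why batch CML training is slow.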

Obviously, a generative model can also be trained by CML,
since from the joint distribution in Equation 1 one can always
calculate the conditional probability in Equation 2 (although not
the other way around). Not many results have been reported for
this type of training for RNA probabilistic methods. Additionally,
generative methods can be trained using expectation-maximi-
zation (EM) techniques or Gibbs sampling and Markov chain
Monte Carlo (MCMC) methods, which do not require knowing a
trusted secondary structure of the RNA sequences in the training
set. 43 EM training could be useful when well-trusted structures
are not available for known structural RNAs. A few methods have
reported implementing EM training for RNA folding. 21,25,56

Selection of training sets. The success of a given training 
method depends critically on the selection of an adequate train- 
ing set. Some important limitations of the data sets of structural 
RNAs currently used for training have been recently pointed 
out. 39 

Early statistical methods like Pfold trained the parameters
of the model on transfer RNAs (tRNAs) and large subunit rRNAs
(LSU rRNA). The pre-QRNA and QRNA SCFGs were trained
using tRNAs and small subunit rRNAs (SSU rRNA), 23,24 and
the simple grammars in reference 30 were trained using small
and large subunit rRNAs. Notice that all those methods together
account for only three different RNA structures used for train-
ing. The advent of statistical models with larger numbers of
parameters required using more structurally diverse training sets.
For CONTRAfold, the authors devised a collection of 151 dif-
ferent structures from Rfam (S-151Rfam). 31 This training set is
structurally diverse, but the number of actual sequences is small.
CONTRAfold has a relatively small number of parameters
(about 300), which made its training on S-151Rfam successful.
When Andronescu et al. started training grammars with several
thousands of parameters compatible with the nearest-neigh-
bor thermodynamic model, they devised a large set of known
RNA structures taken from different reliable sources. 57-62 The
Andronescu data sets (which come under different names as they
have evolved in time: S-Processed-TRA/TES, 33 RNA STRAND
v2.0, 40 and S-Full-Train/Test 34 ) are currently the most widely
used sets of RNA structures for training and testing the perfor-
mance of new methods. Still, the Andronescu data set, while
containing on the order of 3,000 sequences, covers only six
different RNA structures: small and large subunit rRNAs, tRNAs,
tmRNAs, ribonuclease P RNA and signal recognition particle RNAs.

In order to benchmark methods for RNA secondary structure
prediction, one needs to test performance on structural RNA
sequences that are dissimilar to those in the training set not just
by sequence, but also by structure. This fact has been
underappreciated for some time, but it has become apparent when
analyzing models with large numbers of parameters that have used
subsets of the Andronescu data set both for training and testing. 39
In order to be able to train and test using structurally different
data sets, reference 39 devised a new training/testing paradigm.
The new data sets include TrainSetA/TestSetA, which consists of a
non-redundant version of the Andronescu data sets (containing six
different structures) augmented with four other structures from
reference 63, and TrainSetB/TestSetB, structurally dissimilar to
the former, which consists of 22 different RNA structures col-
lected from Rfam v10.0.

Most of the RNA structural diversity currently available comes 
from sequences collected from the Rfam database. Those struc- 
tures are problematic since they are oftentimes just predictions, 
and by construction they are consensus structures for an align- 
ment of the RNA family, not individual sequence structures. An 
advance for single-sequence RNA secondary structure prediction
will likely come as the number of crystallized, structurally
distinct individual RNA structures increases and the training of
these models improves.

Folding Algorithms 

Unlike the training algorithms, which are specific for the different 
scoring schemes, the folding algorithms are essentially identical 
for all scoring schemes. Thus, one can talk about different struc- 
ture prediction algorithms independently of the scoring system. 
For example, in the generalized grammar method TORNADO 39 
the exact same C functions are used when an architecture is given 
a thermodynamic parameterization that reproduces the results
of the ViennaRNA package, and when the same architecture is 
given a probabilistic parameterization trained on a data set of 
RNA structures. One just needs to replace in the algorithms the 
thermodynamic scores with the logarithm of the corresponding 
probability parameters, or with the weights of a statistical dis- 
criminative method. 

The most widely used folding algorithms fall in the category 
of "dynamic programming" (DP) algorithms. These algorithms 
assume that the RNA structure is additive (which will be true
for any CFG). Additivity allows the DP algorithms to achieve
optimality with a computational complexity of $O(L^3)$ in time and
$O(L^2)$ in memory, for a sequence of length $L$. In principle, com-
plex nearest-neighbor architectures can require $O(L^4)$ in time,
but under certain simplifying assumptions, 64 or by simply fixing
a maximal length for internal loops (30 nucleotides in most current
implementations), it is usually brought back to $O(L^3)$, which
is already quite taxing for everyday use with long sequences.

The first dynamic programming algorithms for RNA second- 
ary structure prediction were introduced even before the advent 
of the nearest-neighbor model, 1,65 and they were almost immedi- 
ately applied to the nearest-neighbor model. 2,3 The prominence 
of DP algorithms is such that several specific programming lan-
guages have been devised in order to facilitate the fast and auto-
matic description of such algorithms for a variety of RNA CFG
architectures 39 and other problems. 66-69

The main dynamic programming algorithms used in RNA 
secondary structure prediction are: 

The optimal structure: This dynamic programming cal- 
culates the best scoring structure given the sequence and the 
model. In a thermodynamic parameterization, it calculates the 
minimum free-energy structure, and it is referred to as the MFE 
algorithm. In a probabilistic implementation, it calculates the 
structure with the highest probability, and it is named the
Cocke-Younger-Kasami (CYK) algorithm. 70-72 The MFE and CYK
algorithms are essentially the same algorithm.
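The claim that one recursion serves every scoring scheme can be illustrated with a toy optimal-structure algorithm. This is a minimal sketch, not the MFE or CYK code of any package: it assumes a Nussinov-style architecture (nested pairs, each enclosing at least three unpaired bases, no stacking or loop terms), and all numeric parameters are hypothetical. Passing log-probabilities gives a CYK-style search; passing $-\Delta G/RT$ values gives an MFE-style search, since maximizing $-\Delta G/RT$ is minimizing $\Delta G$. The triple loop over `i`, `j` and `k` is the source of the $O(L^3)$ time and the `best` table of the $O(L^2)$ memory.

```python
import math

MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def optimal_structure(seq, pair_score, unpaired_score=0.0):
    """Highest-scoring nested structure of seq as (score, dot_bracket).

    pair_score maps allowed pairs such as ("G", "C") to a real-valued score,
    and unpaired_score is added for every unpaired base. Scores are maximized.
    """
    n = len(seq)
    if n == 0:
        return 0.0, ""
    best = [[0.0] * n for _ in range(n)]

    def sub(i, j):
        return best[i][j] if i <= j else 0.0  # empty subsequence scores 0

    def pair_option(i, k, j):
        return sub(i, k - 1) + pair_score[(seq[k], seq[j])] + sub(k + 1, j - 1)

    for i in range(n):
        best[i][i] = unpaired_score
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            cand = best[i][j - 1] + unpaired_score       # j unpaired
            for k in range(i, j - MIN_LOOP):             # j paired with k
                if (seq[k], seq[j]) in pair_score:
                    cand = max(cand, pair_option(i, k, j))
            best[i][j] = cand

    # Traceback: recomputing the same expressions reproduces stored values exactly.
    structure, stack = ["."] * n, [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == best[i][j - 1] + unpaired_score:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in pair_score and best[i][j] == pair_option(i, k, j):
                structure[k], structure[j] = "(", ")"
                stack += [(i, k - 1), (k + 1, j - 1)]
                break
    return best[0][n - 1], "".join(structure)


PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
RT = 0.0019872 * 310.15                    # kcal/mol at 37 C
dG = dict(zip(PAIRS, [-2.0, -2.0, -3.0, -3.0, -1.0, -1.0]))   # hypothetical
thermo = {p: -g / RT for p, g in dG.items()}
probs = dict(zip(PAIRS, [0.15, 0.15, 0.3, 0.3, 0.05, 0.05]))  # hypothetical
logp = {p: math.log(x) for p, x in probs.items()}

print(optimal_structure("GGGAAACCC", thermo)[1])                # (((...)))
print(optimal_structure("GGGAAACCC", logp, math.log(0.25))[1])  # (((...)))
```

Only the parameter tables differ between the two calls; the fill and traceback code is shared.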

The partition function or probability of the RNA: Given
an RNA sequence, one can integrate the contribution of all struc-
tures allowed by the model. In a thermodynamic parameteriza-
tion, this requires assigning a Boltzmann factor to each structure
and summing them all into what is referred to as the partition func-
tion. In a probabilistic implementation, it simply requires sum-
ming the probabilities of all possible structures.

The McCaskill algorithm is a DP algorithm that was intro-
duced to efficiently calculate the thermodynamic partition
function, 73 and it is closely related to the inside (and outside)
algorithms used for an SCFG to calculate the probability of the
RNA summed over all possible structures. 74,75 These algorithms
have the same time and memory complexity as the simpler opti-
mal structure algorithms, namely, $O(L^3)$ in time and $O(L^2)$ in
memory for a sequence of length $L$.
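Replacing the maximization of an optimal-structure recursion by a sum turns it into a partition-function (inside) recursion. A minimal sketch, assuming a toy Nussinov-style architecture (nested pairs enclosing at least three unpaired bases, one weight per pair type, no stacking or loop terms), so it is only a distant relative of McCaskill's algorithm: with pair weights $e^{-\Delta G/RT}$ it returns a toy partition function, and with probabilities as weights it returns the probability of the sequence summed over all structures. The free energies in the example are hypothetical.

```python
import math

MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def partition_function(seq, pair_weight, unpaired_weight=1.0):
    """Sum over all nested structures of the product of their weights.

    Z[i][j] covers the half-open subsequence seq[i:j]; an empty one has Z = 1.
    Each structure is counted once: the last base is either unpaired or
    paired with exactly one earlier base k.
    """
    n = len(seq)
    Z = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        Z[i][i] = 1.0
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            last = j - 1
            total = Z[i][last] * unpaired_weight          # last base unpaired
            for k in range(i, last - MIN_LOOP):           # last base paired with k
                w = pair_weight.get((seq[k], seq[last]))
                if w is not None:
                    total += Z[i][k] * w * Z[k + 1][last]
            Z[i][j] = total
    return Z[0][n]


RT = 0.0019872 * 310.15                          # kcal/mol at 37 C
dG = {("G", "C"): -3.0, ("C", "G"): -3.0}         # hypothetical pair free energies
boltzmann = {p: math.exp(-g / RT) for p, g in dG.items()}
Z = partition_function("GGAAACC", boltzmann)
# Probability of the structure ((...)): its Boltzmann weight divided by Z.
print(boltzmann[("G", "C")] ** 2 / Z)
```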

Posterior probabilities of basepairs: Using the partition
function or the inside/outside algorithms, one can calculate the
posterior probability that any two bases in the sequence form
a basepair. These basepair probabilities are named "posterior"
because the basepair potential of any two bases is inferred by
taking into account the overall folding potential of the rest of
the RNA.
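The posterior of a pair is the summed weight of every structure containing it divided by the total weight, which inside and outside passes compute without enumerating structures. A minimal sketch for a toy Nussinov-style architecture (an assumption: nested pairs enclosing at least three unpaired bases, one weight per pair type, no loop or stacking terms); the function name and the pair weights in the example are hypothetical.

```python
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def basepair_posteriors(seq, pair_weight, unpaired_weight=1.0):
    """Symmetric matrix P with P[k][l] = posterior probability that k pairs with l."""
    n = len(seq)

    def pair_options(i, j):
        # (k, w) for every allowed pair between base k and the last base j - 1
        for k in range(i, j - 1 - MIN_LOOP):
            w = pair_weight.get((seq[k], seq[j - 1]))
            if w is not None:
                yield k, w

    # Inside pass: Z[i][j] sums the weights of all structures of seq[i:j].
    Z = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        Z[i][i] = 1.0
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            Z[i][j] = Z[i][j - 1] * unpaired_weight + sum(
                Z[i][k] * w * Z[k + 1][j - 1] for k, w in pair_options(i, j))

    # Outside pass, longest spans first: O[i][j] is the summed weight of all
    # ways of completing a structure of the whole sequence around seq[i:j].
    O = [[0.0] * (n + 1) for _ in range(n + 1)]
    O[0][n] = 1.0
    P = [[0.0] * n for _ in range(n)]
    for span in range(n, 0, -1):
        for i in range(n - span + 1):
            j = i + span
            out = O[i][j]
            O[i][j - 1] += out * unpaired_weight
            for k, w in pair_options(i, j):
                O[i][k] += out * w * Z[k + 1][j - 1]
                O[k + 1][j - 1] += out * Z[i][k] * w
                P[k][j - 1] += out * Z[i][k] * w * Z[k + 1][j - 1]

    for k in range(n):
        for l in range(k + 1, n):
            P[k][l] /= Z[0][n]
            P[l][k] = P[k][l]
    return P


P = basepair_posteriors("GGAAACC", {("G", "C"): 2.0})
unpaired = [1.0 - sum(row) for row in P]  # posterior of being single-stranded
print(round(P[1][5], 4), round(unpaired[0], 4))
```

Each entry of `P` reflects the whole ensemble: the inner pair (1, 5) is supported both on its own and inside the outer pair (0, 6).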

Using the McCaskill/inside algorithm one can also sample 
structures from the Boltzmann ensemble or posterior distribution 
of structures given a sequence. This is useful for studying possible 
alternative structures for a given RNA. 76 For this problem, there 
is an algorithm that samples rigorously and exactly from the dis- 
tribution of structures. 43,77,78 Dynamic programming algorithms 
have also been proposed to calculate other posterior probabilities
beyond basepair interactions, such as triplet interactions. 79
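Exact sampling can be sketched as a stochastic traceback through an inside table: for each subsequence, each way of generating its last base (unpaired, or paired with one earlier base) is chosen with probability proportional to its contribution to $Z$. This is a minimal sketch for a toy Nussinov-style architecture (an assumption: nested pairs enclosing at least three unpaired bases, one weight per pair type), not the sampler of any particular package; the function names and weights are hypothetical.

```python
import random

MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def inside_table(seq, pair_weight, unpaired_weight=1.0):
    """Z[i][j]: summed weight of all nested structures of seq[i:j]."""
    n = len(seq)
    Z = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        Z[i][i] = 1.0
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            total = Z[i][j - 1] * unpaired_weight
            for k in range(i, j - 1 - MIN_LOOP):
                w = pair_weight.get((seq[k], seq[j - 1]))
                if w is not None:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z


def sample_structure(seq, pair_weight, Z, rng, unpaired_weight=1.0):
    """Draw one structure with probability (its weight) / Z[0][n]."""
    structure, stack = ["."] * len(seq), [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        r = rng.random() * Z[i][j]
        acc = Z[i][j - 1] * unpaired_weight
        choice = None                      # None means: last base unpaired
        if r >= acc:
            for k in range(i, j - 1 - MIN_LOOP):
                w = pair_weight.get((seq[k], seq[j - 1]))
                if w is None:
                    continue
                acc += Z[i][k] * w * Z[k + 1][j - 1]
                if r < acc:
                    choice = k
                    break
            # if round-off leaves no option chosen, fall back to "unpaired"
        if choice is None:
            stack.append((i, j - 1))
        else:
            structure[choice], structure[j - 1] = "(", ")"
            stack += [(i, choice), (choice + 1, j - 1)]
    return "".join(structure)


seq, weights = "GAAAC", {("G", "C"): 2.0}  # hypothetical pair weight
Z = inside_table(seq, weights)
rng = random.Random(1)
samples = [sample_structure(seq, weights, Z, rng) for _ in range(20000)]
print(samples.count("(...)") / len(samples))  # close to 2/3 = 2 / (1 + 2)
```

Each draw costs $O(L^2)$ after the $O(L^3)$ inside pass, so many alternative structures can be sampled cheaply once the table is built.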

Maximal expected accuracy (MEA) structure: Using the 
posterior probabilities of basepairs, one can infer a point estimate 
of a plausible structure such that it maximizes the total poste-
rior probability of basepairs. Methods of this kind were first
introduced in sequence analysis for alignment algorithms, 80 later
applied to hidden Markov models 81 and first used for RNA sec-
ondary structure in the method Pfold. 22

Since then, different MEA estimators have been implemented
for RNA folding, either by maximizing the posterior probabilities
of basepairs, 22,31 or by calculating centroid estimators. 36,82
Reference 21 optimizes $\sum_{(i,j) \in \pi} P_{ij}$, where $P_{ij}$ is the posterior
probability of forming a basepair between positions $i$ and $j$.
Reference 31 optimizes the quantity

$$\sum_{(i,j) \in \pi} 2\gamma P_{ij} + \sum_{i \text{ unpaired in } \pi} P_i,$$

where $P_i$ is the posterior probability of position $i$ being
single-stranded, and $\gamma$ is a positive tunable parameter. I refer to
this method as c-MEA (for CONTRAfold-MEA). Reference
36 optimizes the quantity

$$\sum_{(i,j) \in \pi} \left[ (\gamma + 1) P_{ij} - 1 \right],$$

which is a generalization of the centroid estimator in refer-
ence 82, which corresponds to the particular case $\gamma = 1$. These
$\gamma$-centroid estimators are often referred to as generalized cen-
troid estimators (GCE). All MEA methods (except for the
simpler $\gamma = 1$ centroid) require an additional dynamic
programming calculation (with the same time and memory require-
ments as other RNA DP algorithms) to obtain the optimal
point-estimate MEA structure from the distribution of all pos-
sible structures.
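Given a matrix of basepair posteriors, the c-MEA and $\gamma$-centroid estimators reduce to one maximization with different gains: c-MEA rewards each pair with $2\gamma P_{ij}$ and each unpaired base with $P_i$, while the $\gamma$-centroid rewards each pair with $(\gamma+1)P_{ij} - 1$ and unpaired bases with nothing. A minimal sketch, assuming the posteriors are already computed (the matrix below is hypothetical), that pairs with zero posterior are simply disallowed, and that any minimum-loop constraint is already reflected in the posteriors; the function names are illustrative, not those of CONTRAfold or any other package.

```python
def max_gain_structure(n, pair_gain, unpaired_gain):
    """Nested structure of n bases maximizing per-pair plus per-unpaired gains.

    pair_gain(k, l) returns None when k and l may not pair.
    """
    best = [[0.0] * (n + 1) for _ in range(n + 1)]     # best[i][j] for bases i..j-1
    choice = [[None] * (n + 1) for _ in range(n + 1)]  # partner of j-1, or None
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            best[i][j] = best[i][j - 1] + unpaired_gain(j - 1)
            for k in range(i, j - 1):
                g = pair_gain(k, j - 1)
                if g is not None:
                    cand = best[i][k] + g + best[k + 1][j - 1]
                    if cand > best[i][j]:
                        best[i][j], choice[i][j] = cand, k
    structure, stack = ["."] * n, [(0, n)]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j - 1] = "(", ")"
            stack += [(i, k), (k + 1, j - 1)]
    return "".join(structure)


def c_mea(P, gamma):
    """Maximize sum over pairs of 2*gamma*P_kl plus sum over unpaired i of P_i."""
    single = [1.0 - sum(row) for row in P]  # P_i: posterior of being unpaired
    return max_gain_structure(
        len(P), lambda k, l: 2 * gamma * P[k][l] if P[k][l] > 0 else None,
        lambda i: single[i])


def gamma_centroid(P, gamma):
    """Maximize sum over pairs of (gamma + 1) * P_kl - 1."""
    return max_gain_structure(
        len(P), lambda k, l: (gamma + 1) * P[k][l] - 1 if P[k][l] > 0 else None,
        lambda i: 0.0)


P = [[0.0] * 5 for _ in range(5)]
P[0][4] = P[4][0] = 2 / 3  # hypothetical posterior for a single candidate pair
for g in (0.2, 1.0):
    print(g, c_mea(P, g), gamma_centroid(P, g))
```

Raising $\gamma$ makes both estimators more willing to predict pairs, trading positive predictive value for sensitivity; here the single pair appears only at $\gamma = 1$.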

In terms of performance, all MEA methods perform some- 
what better than the simpler CYK method. Among all the MEA 
methods, those with a tunable parameter 31,36 are superior to 
those that do not have that feature. 22,82 Different MEA methods 
with a tunable parameter perform comparably to each other. 39 
These improvements of MEA over CYK algorithms occur con- 
sistently across all three different scoring systems, and have been 
documented for several different actual parameterizations. 36 
Currently, most RNA folding packages use an MEA estimator to
predict RNA structures. 63 For that reason, when comparing the
performance of different methods, it is important to use the same
folding algorithm; otherwise, the differences will reflect the
differences in the folding algorithms rather than the differences
in the models' features.



www.landesbioscience.com 



RNA Biology 



1193 



Discussion 

With so many possible methods with large numbers of parameters,
alternative scoring schemes, different statistical parameteriza- 
tions, and more powerful folding algorithms, what is the state of 
the art for single-sequence RNA secondary structure prediction? 

Using methods like TORNADO 39 and others, 83 there has 
been an extensive exploration of RNA grammar architectures that 
go even beyond the complexities of the nearest-neighbor model. 
However, the performance of these more complex architectures 
and alternative scoring systems is not dramatically better than 
already existing ones. A method has also been proposed (named 
hierarchical Dirichlet process for SCFGs) in which the architec- 
ture of the model is not fixed a priori but optimally inferred in 
view of the training set. The idea is quite exciting, but results 
again seem to be comparable to those of other methods. 84 

A representative (but by no means comprehensive) summary of 
the current situation of different RNA single-sequence secondary 
structure prediction methods is given in Table 1. One observa- 
tion is that thermodynamic methods are outperformed by statistical 
methods (both probabilistic and weighted) with a comparable num-
ber of parameters. A comparison between the methods ViennaRNA
v1.8.4 and ViennaRNAG is a direct measure of the difference in per-
formance of the same architecture under a thermodynamic or a
probabilistic parameterization. Similarly, a comparison between
CONTRAfold v2.02 and CONTRAfoldG is a direct measure of 
the differences between the same architecture under a weighted 
and probabilistic parameterization respectively. In both cases, 
the probabilistic parameterization performs better, but the differ- 
ences are not large (especially for the two CONTRAfold statisti- 
cal models). 

Another observation from Table 1 is that features that have
been considered fundamental by the thermodynamic methods seem
to have a small effect on performance accuracy. For instance, the
model basic grammar, while maintaining the same basic architec-
ture as the nearest-neighbor model, does not include basepair
stacking or mismatches, and its total number of parameters is
significantly lower than in any thermodynamic implementa-
tion. However, basic grammar under a probabilistic scoring sys-
tem performs better than the thermodynamic methods tested
in Table 1. It would be interesting to obtain a thermodynamic
parameterization for this CFG.

Models with complex architectures that use probabilistic scoring
systems show the highest accuracy among all models tested. However,
the best performance achieved is only around 60% for the F mea-
sure. (F is defined as the harmonic mean of sensitivity and posi-
tive predictive value.) There seems to be a barrier that prevents
achieving higher performance without the risk of overfitting, as
can be seen in Table 1 with regard to the method ContextFold
when comparing its performance on both data sets (the data set
used to train ContextFold is structurally similar to TestSetA).
A dedicated effort to crystallize and to compile reliable second-
ary structures for more diverse RNA structures would certainly
alleviate the overfitting problem in training large architectures,
but how much improvement that would produce remains to be
seen.

Single-sequence RNA secondary structure prediction is
to some extent an exercise performed with one hand tied
behind one's back. When possible, one should always use comple-
mentary sources of information.

Some leverage has come in recent years by incorporating 
experimental information such as nucleotide-resolution selec- 
tive hydroxyl acylation analyzed by primer extension (SHAPE) 
into methods for RNA folding. 85-89 Another promising source of
information consists of the emerging patterns of covariation for
non-canonical types of basepairs. 90 There is a very active area of
research on RNA tertiary motifs. 91 Current methods seem to
go in the direction of profiling and cataloging the different types
of motifs, 92-96 or can only be used to predict the structure of very
small RNA sequences. 49

I would advocate that at the moment, comparative analysis 
(at the structural/sequence level) is the most powerful method 
for characterizing new structural RNAs and inferring their 
structure. In the presence of a novel RNA, one can use any 
of the single-sequence methods to have a proposed secondary 
structure. These days, it is very likely that similar sequences from
close relatives can be obtained by sequence-only comparative analy-
sis. In that situation, one should use the much more powerful
profile structural comparative methods 28,97 in order to build a
consensus structure for all homolog sequences. One can also
use the profile model to search for other more distant can-
didates, which in turn will help refine the structure of the
RNA. Representative examples of this approach are given in
references 98 and 99.